The development of a mesh topology for multi-node electrocardiogram
(ECG) monitoring based on the ZigBee protocol still has limitations. When
more than one active ECG node sends a data stream, data can be corrupted
or lost because of a synchronization failure. The incorrect data
affect signal interpretation. Therefore, a mechanism is needed to correct or
predict the damaged data. In this study, expectation-maximization (EM)
and regression imputation (RI) are proposed to
overcome these problems. Real data from previous studies are the main
material used in this study. The predicted ECG signal data are
compared with the actual ECG data stored in the main controller
memory. Root mean square error (RMSE) is calculated to measure system
performance. The simulation was performed on 13 ECG waves, each of
which has 1000 samples. The simulation results show that the EM method has
a lower prediction error than the RI method. The average RMSE for
the EM and RI methods is 4.77 and 6.63, respectively. The proposed method
is expected to be used in multi-node ECG monitoring, especially
in ZigBee applications, to minimize errors.
1. INTRODUCTION


Recent years have seen considerable progress in applying computing technologies [1]-[8], including
significant progress in artificial intelligence. The development of wireless communication media for
internet of things applications is always followed by the development of protocols to support multiple or
multiuser access. Multiuser monitoring and control applications have been applied in many areas, one of which
is health. Such applications allow centralized, fast, easy, remote, and multiuser health monitoring. One health
parameter that receives serious attention is the condition of the heart, because of the risks posed if it is not
maintained optimally. Heart conditions can be observed by studying the electrical activity of the heart through an
electrocardiogram (ECG) [9]-[11]. Previous research by Hadiyoso and Aulia [12] succeeded in
designing and implementing an ECG monitoring system for several ZigBee-based user nodes. In its
application, however, there is a crucial problem, namely damage to or loss of data when more than one active
node is sending data streams [13]. This problem is likely to occur because of a synchronization failure between
the user/end node and the coordinator.


Estimating missing data is an important step in the data cleaning stage.
Numerous studies have demonstrated that improper data management results in inaccurate analysis [14].
Missing data, as indicated by the absence of data items for a subject, can obscure potentially significant
information. In practice, missing data has emerged as a significant determinant of data quality. Thus,
imputation of the missing values is needed [15]. Missing data is a common weakness in classification
problems, and it can make the output of a prediction system ineffective [16], [17]. Ignoring missing data
affects the results of analysis [18], [19], the outcomes of learning, and the outcomes of
predictions in the collaborative prediction problem [20]. In quantitative studies, missing data leads to biased
parameter estimates [21]-[24]. In predictive models, the choice of method for handling incorrect or
missing data can affect model performance [22], [25]. Missing data are common in medical research, and if not
handled properly, they can result in a loss of statistical power and potentially biased results [26]-[28].
Data collection problems commonly involve noisy data. In addition to noisy data,
organizations face challenges with missing data. Missing data affects large-scale data
collection, so investigating different filtering techniques for large data environments is valuable
[29]. This study does not investigate the cause of the problem; rather, it addresses how to
correct or predict the incorrect data with techniques commonly used for missing data.
This is the motivation of the proposed research: to provide a reliable telemonitoring system with the smallest
possible error rate to avoid misinterpretation.

Several methods have been applied to predict missing data in various applications. In general,
missing value imputation techniques fall into two categories: statistical and machine learning-based
techniques [30]. Expectation-maximization (EM), linear regression (LR), least squares (LS), and mean/mode
are the four statistical techniques most frequently used [31]. Using EM to impute missing data has
several advantages: missing data need not be ignored, which increases the information available for
accurate diagnosis [32], and many patterns of missing data can be handled [33]. Imputation
using linear regression results in a small standard deviation [34]; regression imputation is better than
mean imputation but still results in biased parameter estimates [35]. Machine learning methods such as support
vector machines (SVM) and artificial neural networks (ANN) have also been reported for data prediction
[36], [37]. However, these methods have high computational costs and are complex to implement on
computers with low memory resources.

The literature reviewed above provides a sufficient basis for the proposed study. Research
on predicting missing data using a mathematical approach provides enough evidence that it can be applied to
solve problems in ZigBee-based multiuser monitoring. In this study, we applied two methods
to overcome the incorrect data: EM and linear regression imputation. This study aims to predict the
incorrect data and determine the better of the two proposed methods. Performance is analyzed
by calculating the root mean square error (RMSE) between the reconstructed data and the actual data
stored in the microcontroller memory. The remainder of the paper is organized as follows: section 2
explains the data collection and the proposed methods for handling missing data. In
section 3, we present and discuss the results of the simulations that have been carried out. In section 4, we
present the conclusions and implications of this study.


2. MATERIAL AND METHOD 
2.1. Data collection 

The ECG data are sourced from previous studies and are real data streamed from each user node.
This multipoint ECG network uses a mesh topology in which a coordinator receives data from the other
nodes. The coordinator is connected to a personal computer (PC), which displays the ECG chart
for the active node. ECG data from each node are also stored in memory by the microcontroller to
serve as the reference when calculating RMSE. Figure 1 is an example of a graph
of an ECG signal that has errors at nodes C and D. At node C, the marked section of the graph shows a data
value of 0, which distorts the ECG waveform. Meanwhile, at node D, the marked section shows that the
data has risen sharply, like a spike (reaching a value of 1024 in decimal). Some other sample
points were also removed manually at random to test the robustness of the proposed method, which also
distorts the ECG waveform. These conditions are the main problem when errors occur
in data transmission. This phenomenon is treated here as a problem of incorrect data.
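
As an illustration of how such samples could be flagged before imputation, the following minimal Python sketch marks suspicious samples as missing. The thresholds (a reading of 0, and a reading at or above a hypothetical ceiling of 1023) and the function name flag_incorrect_samples are assumptions made for this example; the study does not specify an automatic detection rule.

```python
from typing import List, Optional

# Assumed thresholds (not given in the study): a stuck-at-zero reading and
# a spike at or near the top of the value range are treated as incorrect.
ZERO_LEVEL = 0
SPIKE_LEVEL = 1023  # hypothetical ceiling; the text reports spikes near 1024


def flag_incorrect_samples(samples: List[int]) -> List[Optional[int]]:
    """Return a copy of samples with suspicious values replaced by None."""
    flagged: List[Optional[int]] = []
    for value in samples:
        if value <= ZERO_LEVEL or value >= SPIKE_LEVEL:
            flagged.append(None)  # mark as missing for later imputation
        else:
            flagged.append(value)
    return flagged


# Illustrative stream only; these are not values from the recorded dataset.
stream = [260, 262, 0, 274, 1024, 266]
print(flag_incorrect_samples(stream))
```

Samples marked as None can then be passed to either imputation method described in section 2.2.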


2.2. Method for handling missing data 
2.2.1. Regression imputation (RI) 

In the regression model, the observed values are used to calculate missing values. The estimated
value is then used to fill in the blanks where the value was previously missing. Compared with mean imputation,
this method has the advantage of using more information to determine the appropriate
imputation value [38]. Regression imputation replaces missing values with
predicted values that are estimated primarily from the data that is available. The variables with missing
values strongly influence which regression imputation technique is used [39]. Regression imputation begins by
estimating the mean vector and covariance matrix of the data from the sub-matrices that contain
no missing values [40]. Following that, the observed data is used to fit a linear regression for the
missing variables. The values obtained from the linear regression are then used to replace the missing data.
However, the variance and covariance of the data are underestimated by this method [41].

Consider the case of univariate non-response, where $Y_1, \ldots, Y_{K-1}$ are completely observed and $Y_K$
is observed for the first r observations but missing for the last n-r observations. Regression imputation
computes the regression of $Y_K$ on $Y_1, \ldots, Y_{K-1}$ from the r complete cases and fills in the missing
values with the regression predictions [40]:


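$$\hat{y}_{iK} = \hat{\beta}_{K0\cdot 12\ldots K-1} + \sum_{j=1}^{K-1} \hat{\beta}_{Kj\cdot 12\ldots K-1}\, y_{ij} \tag{1}$$


where $\hat{\beta}_{K0\cdot 12\ldots K-1}$ is the intercept and $\hat{\beta}_{Kj\cdot 12\ldots K-1}$ is the coefficient of $Y_j$ in the
regression of $Y_K$ on $Y_1, \ldots, Y_{K-1}$ based on the r complete cases.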
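
The regression imputation of (1) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the use of None to mark a missing last column, and the solution of the normal equations by Gauss-Jordan elimination are all choices made for this example.

```python
from typing import List, Optional, Tuple


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; predictors may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def regression_impute(
    rows: List[List[Optional[float]]],
) -> Tuple[List[List[float]], List[float]]:
    """Fill a missing last column (None) with least-squares predictions.

    Columns 0..K-2 play the role of Y_1..Y_{K-1}; the last column is Y_K.
    """
    complete = [r for r in rows if r[-1] is not None]
    k = len(rows[0])  # intercept plus K-1 predictors gives K coefficients
    x = [[1.0] + [float(v) for v in r[:-1]] for r in complete]
    y = [float(r[-1]) for r in complete]
    # Normal equations (X^T X) beta = X^T y, fitted on the complete cases.
    xtx = [[sum(xi[p] * xi[q] for xi in x) for q in range(k)] for p in range(k)]
    xty = [sum(xi[p] * yi for xi, yi in zip(x, y)) for p in range(k)]
    beta = solve_linear_system(xtx, xty)
    filled = []
    for r in rows:
        if r[-1] is None:
            # Equation (1): intercept plus weighted sum of observed values.
            pred = beta[0] + sum(bj * v for bj, v in zip(beta[1:], r[:-1]))
            filled.append([float(v) for v in r[:-1]] + [pred])
        else:
            filled.append([float(v) for v in r])
    return filled, beta


# Illustrative data only: the last column equals 2 + 3*y1 - y2 where observed.
example = [[1, 2, 3], [2, 1, 7], [3, 5, 6], [4, 3, 11], [5, 4, None]]
completed, coefficients = regression_impute(example)
print(completed[-1], coefficients)
```

Because the imputed values lie exactly on the fitted regression surface, this approach understates the variability of the data, which is the limitation noted above.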
Figure 1. Examples of ECG waves that have errors [5]


2.2.2. Expectation-maximization (EM) imputation 

Maximum likelihood is a mathematical procedure for determining one or more parameters of a statistical
model by maximizing the likelihood of the observed data. The EM algorithm is an
iterative algorithm for determining maximum likelihood parameter estimates in the presence of
incomplete data. EM imputation is a two-step iterative procedure based on the maximum likelihood
method. In the first step, the unobserved values are inferred from their expected values under the current
parameter estimates (E-step). In the second step, the parameters are re-estimated by maximizing the expected
complete-data loglikelihood (M-step). This EM cycle is then repeated until the imputed values satisfy the
specified convergence criteria. The EM imputation method
generates an unbiased estimate of the standard parameter [41]-[44].
Specifically, let $\theta^{(t)}$ be the current estimate of the parameter $\theta$. The E step of EM finds the
expected complete-data loglikelihood if $\theta$ were $\theta^{(t)}$:


$$Q(\theta \mid \theta^{(t)}) = \int \ell(\theta \mid Y_{obs}, Y_{mis})\, f(Y_{mis} \mid Y_{obs}, \theta = \theta^{(t)})\, dY_{mis} \tag{2}$$


where $Y_{mis}$ is the missing data and $Y_{obs}$ is the observed data. The M step of EM determines $\theta^{(t+1)}$ by
maximizing this expected complete-data loglikelihood:


$$Q(\theta^{(t+1)} \mid \theta^{(t)}) \ge Q(\theta \mid \theta^{(t)}) \quad \text{for all } \theta \tag{3}$$
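
To make the E and M steps concrete, the following Python sketch applies EM to a two-variable normal model. It is a deliberate simplification under stated assumptions and not the authors' implementation: the study works with 13 ECG variables, whereas this example handles only pairs, assumes at most one missing entry per pair, assumes non-constant observed values (so the variances are nonzero), and uses hypothetical names throughout.

```python
from typing import List, Optional, Tuple

Pair = Tuple[Optional[float], Optional[float]]


def em_impute_bivariate(pairs: List[Pair], max_iter: int = 500,
                        tol: float = 1e-10) -> List[Tuple[float, float]]:
    """Impute missing entries of (y1, y2) pairs with EM under a normal model.

    At most one of the two entries in each pair may be None.
    """
    n = len(pairs)
    obs1 = [p[0] for p in pairs if p[0] is not None]
    obs2 = [p[1] for p in pairs if p[1] is not None]
    # Starting values: available-case means and variances, zero covariance.
    mu1, mu2 = sum(obs1) / len(obs1), sum(obs2) / len(obs2)
    s11 = sum((v - mu1) ** 2 for v in obs1) / len(obs1)
    s22 = sum((v - mu2) ** 2 for v in obs2) / len(obs2)
    s12 = 0.0

    def conditional(y1, y2):
        """Return (E[y1], E[y2], Var[y1], Var[y2]) given the observed entry."""
        if y2 is None:
            return y1, mu2 + s12 / s11 * (y1 - mu1), 0.0, s22 - s12 ** 2 / s11
        if y1 is None:
            return mu1 + s12 / s22 * (y2 - mu2), y2, s11 - s12 ** 2 / s22, 0.0
        return y1, y2, 0.0, 0.0

    for _ in range(max_iter):
        # E-step: expected sufficient statistics under the current estimate.
        t1 = t2 = t11 = t22 = t12 = 0.0
        for y1, y2 in pairs:
            e1, e2, v1, v2 = conditional(y1, y2)
            t1 += e1
            t2 += e2
            t11 += e1 * e1 + v1
            t22 += e2 * e2 + v2
            t12 += e1 * e2
        # M-step: closed-form maximizer of the expected loglikelihood.
        new = (t1 / n, t2 / n,
               t11 / n - (t1 / n) ** 2,
               t22 / n - (t2 / n) ** 2,
               t12 / n - (t1 / n) * (t2 / n))
        change = max(abs(a - b) for a, b in zip(new, (mu1, mu2, s11, s22, s12)))
        mu1, mu2, s11, s22, s12 = new
        if change < tol:  # convergence criterion on the parameter estimates
            break

    # Fill each gap with its conditional mean under the final estimate.
    return [conditional(y1, y2)[:2] for y1, y2 in pairs]


# Illustrative data only; not samples from the recorded ECG dataset.
data = [(260.0, 262.0), (262.0, 266.0), (274.0, None), (266.0, 270.0), (None, 301.0)]
print(em_impute_bivariate(data))
```

For the normal model the M step has a closed form (updated means, variances, and covariance), which is why each iteration is inexpensive.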


2.3. Root mean square error (RMSE) 

The root mean square error (RMSE) is a measure of prediction accuracy for continuous variables,
expressed as an error magnitude in the same units as the data. It is defined mathematically
as the square root of the average squared difference between the predicted and the actual values. In other
words, RMSE quantifies the degree of agreement between observed and predicted data. The RMSE is given
in (4):


$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{4}$$


where $y_i$ is the actual data, $\hat{y}_i$ is the predicted data, and n is the number of data points.
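
Equation (4) translates directly into code. The sketch below is a minimal Python version; the function name rmse and the sample values are illustrative assumptions, not data from the study.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between actual and predicted samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Illustrative values only.
print(rmse([260, 262, 274], [259, 273, 284]))
```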


3. RESULTS AND DISCUSSION 

Typically, missing data follows discernible patterns. Investigating this pattern is critical for
identifying the cases and variables that contribute to the missing data [43], [45]. Figure 2 shows the pattern
of missing data on the ECG waves observed in this study. It shows that the missing data pattern
for the ECG dataset is general. The patterns chart has been enlarged to make the missing value patterns of
the analysis variables easier to interpret. Every
pattern (row) corresponds to a group of cases with the same pattern of incomplete and complete data. In other
words, each pattern groups the cases according to where the missing values are located
(i.e., on which variables). For instance, pattern 1 denotes cases with no missing values, whereas pattern 4
denotes cases with missing values on ECG3. If the data were monotone, all missing cells and all non-missing
cells in the chart would be contiguous. The pattern of missing values in the dataset is
non-monotone, and there are several values that ought to be imputed. As a result, monotonicity is not
demonstrated for this data, and a monotone method of imputation is not justified in this situation. Some methods
for dealing with missing data are applicable to any pattern of missing data, whereas other methods are only
applicable to a specific pattern of missing data.
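
Whether a missingness pattern is monotone can be checked programmatically. The Python sketch below is one possible check, assuming missing entries are marked with None; the function name is_monotone is hypothetical. It relies on the fact that in a monotone pattern the sets of cases missing each variable are nested once the variables are ordered from least to most missing.

```python
from typing import Optional, Sequence


def is_monotone(rows: Sequence[Sequence[Optional[float]]]) -> bool:
    """Return True if the missingness pattern of rows is monotone."""
    n_vars = len(rows[0])
    # For each variable, collect the indices of the cases where it is missing.
    missing_sets = [
        {i for i, row in enumerate(rows) if row[j] is None}
        for j in range(n_vars)
    ]
    # Order variables from least to most missing; in a monotone pattern each
    # set of missing cases must then be contained in the next one.
    missing_sets.sort(key=len)
    return all(a <= b for a, b in zip(missing_sets, missing_sets[1:]))


# Illustrative patterns only.
print(is_monotone([[1, 2, 3], [1, 2, None], [1, None, None]]))
print(is_monotone([[1, None, 3], [None, 2, 3]]))
```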

This study simulates the sending of ECG data streams from 4 nodes, each connected to an
ECG amplifier that records the subject’s ECG waveform. At the same time, the data are transmitted to the personal
computer and stored in memory by the controller unit. The application software displays a signal graph
and saves the data stream in .txt format so that it can be matched against the data stored in memory. In this
study, 13 subjects were involved in recording ECG data through these nodes. For each subject, 1000 samples
of ECG data in decimal format were observed. The RMSE is calculated for each proposed method. Subjective
performance testing is also done by visually inspecting the ECG wave graph. Table 1 shows the RMSE for each
method used in this research.

Handling the incorrect ECG data with the EM method produces the
smallest RMSE value for most waves, with an average RMSE of 4.77. These results indicate that
the EM method handles missing data better than regression imputation. This may be because
the EM algorithm was designed to handle missing data with various data loss patterns and is
computationally simple, offering analytical solutions in the M-step [32]. The linear regression method
produces the smallest RMSE value for ECG 4 and ECG 5 because the pattern of incorrect data there tends to be
monotone, as can be seen in Figure 2.
[Chart: missing value patterns; axes: Pattern and Variable (the ECG waves); legend: nonmissing, missing]


Figure 2. The incorrect data pattern on the ECG dataset


Table 1. RMSE for each method 


ECG RMSE (EM) RMSE (RI) 
ECG1 6.25 6.28 
ECG2 3.57 5.75 
ECG3 5.74 10.03 
ECG4 3.82 3.78 
ECG5 8.36 6.73 
ECG6 3.58 5.26 
ECG7 3.88 7.50 
ECG8 1.51 2.27 
ECG9 4.76 6.40 
ECG10 6.66 10.73 
ECG11 6.95 8.02 
ECG12 5.49 11.02 
ECG13 1.50 2.42 
Mean 4.77 6.63 
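
The averages in Table 1 can be checked from the per-wave values. The short Python sketch below copies the values from Table 1 and recomputes the two means and the number of waves on which EM has the lower RMSE; the variable names are chosen for this example only.

```python
# RMSE values copied from Table 1, ordered ECG1 to ECG13.
rmse_em = [6.25, 3.57, 5.74, 3.82, 8.36, 3.58, 3.88,
           1.51, 4.76, 6.66, 6.95, 5.49, 1.50]
rmse_ri = [6.28, 5.75, 10.03, 3.78, 6.73, 5.26, 7.50,
           2.27, 6.40, 10.73, 8.02, 11.02, 2.42]

mean_em = sum(rmse_em) / len(rmse_em)
mean_ri = sum(rmse_ri) / len(rmse_ri)
# Count the waves on which EM has the smaller error.
em_lower = sum(1 for em, ri in zip(rmse_em, rmse_ri) if em < ri)

print(f"mean EM = {mean_em:.2f}, mean RI = {mean_ri:.2f}, EM lower on {em_lower} of {len(rmse_em)}")
```

The recomputed means agree with the reported 4.77 and 6.63, and EM has the lower RMSE on 11 of the 13 waves, the exceptions being ECG4 and ECG5.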


Figure 3(a) shows the ECG waves with incorrect data, Figure 3(b) shows the actual ECG waves,
Figure 3(c) shows the ECG waves predicted using EM, and Figure 3(d) shows the ECG
waves predicted using RI. As shown in Figure 3(a), the corrupted data occur randomly along the observed signal.
This occurs on all ECGs, especially when all nodes are active. Visually, incorrect data
distort the ECG waves, for example by producing spikes. This can result in
reading errors by the application, especially in heart rate estimation.

Figures 3(c) and 3(d) show the improvement in the ECG waveform achieved by the two proposed
methods. Visually, both produce a waveform similar to the actual ECG wave. This is
because the prediction error of the two methods is small relative to the actual data.
A sample comparison of the prediction results with the actual data stored in memory is shown in Table 2. The
proposed method is able to correct erroneous sample points in the ECG signal by predicting them with
regression and probabilistic approaches. The two proposed methods can be applied to resolve cases of incorrect
data, typically in multi-node ECG monitoring systems that use ZigBee transmission. Based on the
performance test results, the EM method offers smaller prediction errors than the
RI method. However, the RI method has the advantage of simpler computation when applied on computers
with low memory resources.
Figure 3. ECG waves (a) with incorrect data, (b) actual, (c) predicted-EM, and (d) predicted-Reg. imputation


Table 2. Comparison of actual and predicted data on 5 ECG waves 


                       Data predicted
ECG     Actual data    EM (% error)    RI (% error)
ECG-1   260            259 (0.3)       269 (3.4)
        262            273 (4.1)       306 (16.7)
        274            284 (3.6)       285 (4.01)
ECG-2   274            288 (5.1)       258 (5.8)
        266            285 (7.1)       296 (11.2)
        332            301 (9.3)       323 (2.7)
ECG-3   273            307 (12.4)      363 (32)
        279            268 (3.9)       322 (15.4)
        296            292 (1.3)       393 (32.7)
ECG-4   402            413 (2.7)       439 (9.2)
        398            405 (1.7)       404 (1.5)
        418            388 (7.1)       397 (5.02)
ECG-5   322            338 (4.9)       337 (4.6)
        338            336 (0.5)       327 (3.2)
        315            336 (6.67)      327 (3.8)
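
The percentage errors in Table 2 are not defined explicitly in the text. The Python sketch below assumes the usual relative error with respect to the actual value, which appears consistent with the tabulated figures up to rounding or truncation; the function name percent_error is hypothetical.

```python
def percent_error(actual: float, predicted: float) -> float:
    """Absolute prediction error as a percentage of the actual value.

    Assumed definition: |predicted - actual| / |actual| * 100.
    """
    if actual == 0:
        raise ValueError("percent error is undefined for an actual value of 0")
    return abs(predicted - actual) / abs(actual) * 100.0


# Two rows taken from Table 2: (actual, EM prediction) pairs.
for actual, predicted in [(274, 284), (315, 336)]:
    print(actual, predicted, round(percent_error(actual, predicted), 2))
```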


4. CONCLUSION 

This study has succeeded in simulating the prediction of incorrect data in multi-node
ECG telemonitoring based on the ZigBee protocol. Two methods, namely regression imputation and EM, were
applied to overcome this problem. Performance was measured by calculating the RMSE between the
predicted ECG data and the actual data.
The simulation results show that the EM method has a lower prediction error than the RI method, with
average RMSE values of 4.77 and 6.63, respectively. However, on ECG 4 and
ECG 5, regression imputation has a smaller RMSE value. This may be caused
by a pattern of incorrect data that tends to be monotone. Visually, the ECG waves from the two
proposed approaches have a shape like the actual ECG wave. The proposed method is expected to
be applied to predict data in multi-node monitoring systems when incorrect data problems are encountered.